PREFACE

Efficiency  of  storage  management  in  algorithms  which  use  arrays  is 
often  enhanced  if  the  arrays  are  stored  in  a proximity -preserving  manner, 
that  is,  array  positions  which  are  close  to  one  another  in  the  array  are 
also  stored  close  to  one  another  in  the  memory  structure.  It  has  been 
shown  that  any  scheme  that  stores  arrays  in  a linear  memory,  in  both  the 
worst  and  the  average  case,  induces  unbounded  loss  of  proximity,  but 
arrays  can  be  stored  in  binary  trees  with  bounded  loss  of  average 
proximity.  This  paper  is  devoted  to  studying  the  effect  of  introducing 
duplication  of  items  of  a square  array  A on  the  average  path  length 
between  the  images  of  any  two  records  adjacent  in  A under  a mapping  from 
A into  the  set  of  leaves  of  a complete  binary  tree.  It  is  shown  that  with 
the  appropriate  choice  of  duplications,  in  some  arrays  the  average  path 
length can be decreased by as much as 12% without using a deeper tree
than  needed  in  the  absence  of  duplication. 
CHAPTER 1
INTRODUCTION 


The  running  time  of  an  algorithm  is  usually  measured  by  summing  the 
execution  time  attached  to  each  operation  performed.  In  practice,  the 
running  time,  or  complexity,  of  an  algorithm  depends  very  much  on  how 
efficiently  its  data  structure  can  be  accessed.  Algorithms  which  operate 
on  arrays  frequently  access  the  array  in  accordance  with  a notion  of 
locality,  that  is,  the  array  cell  accessed  after  cell  v is  accessed  is  in 
a neighborhood of v. For instance, the usual matrix multiplication
traverses rows and columns of the matrices, and Strassen's matrix
multiplication [1] accesses matrices in blocks of four. Therefore,
it is important to store the array in a manner so that the proximity
is  preserved,  that  is,  cells  that  are  close  to  one  another  in  the  array  are 
also  stored  close  to  one  another  in  the  memory  structure. 

Preservation  of  proximity  in  arrays  has  been  studied  in  [2],  [3],  and 
[4].  In  [2],  A.  Rosenberg  showed  that  any  scheme  that  stores  arrays  in  a 
linear  memory,  in  the  worst  case,  induces  unbounded  loss  of  proximity. 

In [4], R. A. DeMillo et al. showed that even average loss of proximity
is  unbounded  in  such  schemes.  However,  they  showed  that  arrays  can  be 
stored  in  binary  trees  with  bounded  loss  of  average  proximity. 

In  this  paper,  I will  discuss  the  effect  of  introducing  duplication  of 
items  of  an  n X n array  A on  the  average  path  length  between  the  images  of 
any  two  records  adjacent  in  A under  a mapping  from  A into  the  set  of  leaves 
of  a complete  binary  tree.  Arrays  will  be  represented  as  graphs:  vertices 
represent  array  cells  containing  records,  and  arcs  represent  logical 
adjacencies. 


This paper consists of six chapters. In the next chapter, graphical
models of arrays with and without duplications are defined. In Chapter 3,
a binary tree memory structure is defined and a storage allocation mapping
is presented. In Chapter 4, a method for finding the stationary probability
distribution of any array is presented and the average stretch between
two vertices is formulated. In Chapter 5, various duplicating patterns
are discussed and an efficient duplicating scheme is outlined. Finally,
conclusions are drawn.
CHAPTER 2

GRAPHICAL  MODELS  OF  AN  n X n ARRAY 
2.1.  With  No  Duplication 

An n x n array is a two-dimensional data structure. It consists of n
rows and n columns of cells. Each cell can hold one record. If each cell
of an array holds a distinct record, then the array is said to be with no
duplication. In this section, an array with no duplicates will be modeled
as a directed graph.

A finite directed graph G = (V,E) consists of a finite set V of vertices
and a set E of ordered pairs of vertices, the arcs. (v,w) is an arc directed
from a vertex v to a vertex w and the direction is indicated by an arrowhead
on the arc. A vertex w is called a successor of a vertex v if there
is an arc (v,w); vertex v is then called a predecessor of w. The set of
successors of a vertex v will be denoted by Succ(v) and the set of
predecessors of v will be denoted by Pred(v).

Let I(n) denote the index set {1, ..., n}. Suppose A is an n x n
array with no duplication; it consists of $n^2$ cells $c_{i,j}$, $i,j \in I(n)$, where
i denotes the row and j denotes the column. Then the graphical model $G_A$ of
A is a directed graph (V,E), where $V = \{c_{i,j} \mid i,j \in I(n)\}$ and

\[ E = \{(c_{i,j}, c_{i,j+1}),\ (c_{i,j+1}, c_{i,j}),\ (c_{i,j}, c_{i+1,j}),\ (c_{i+1,j}, c_{i,j}) \in V \times V\}. \]

That is, every cell is joined in both directions to each of its horizontal
and vertical neighbors. The graph $G_A$ of the 3 x 3 array A with no
duplication is shown in Figure 1.
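The definition of $G_A$ can be sketched in a few lines of Python. This is only an illustration under the reading given above (arcs in both directions between horizontally and vertically adjacent cells); the function name `grid_successors` is hypothetical and does not come from the paper.

```python
def grid_successors(n):
    """Return {cell: list of successor cells} for the graph G_A of an
    n x n array with no duplication. Cells are 1-based (row, column) pairs."""
    succ = {}
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            candidates = [(i, j - 1), (i, j + 1), (i - 1, j), (i + 1, j)]
            # Keep only the neighbors that actually lie inside the array.
            succ[(i, j)] = [(a, b) for a, b in candidates
                            if 1 <= a <= n and 1 <= b <= n]
    return succ


succ = grid_successors(3)
arc_count = sum(len(targets) for targets in succ.values())
```

For the 3 x 3 array, corner cells have two successors, border cells three and the center cell four, so the graph has 4n(n-1) = 24 arcs.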


2.2. With Duplications

Duplicate copies of a record can be introduced into an array to
enhance its performance. When n is not a power of 2, an n x n array can
be padded to a $2^{\lceil \log n \rceil} \times 2^{\lceil \log n \rceil}$ array with duplicate copies. A
systematic scheme will be described.

A $2^s \times 2^s$ array can be divided into smaller square arrays by a set of
horizontal and vertical lines. These lines are called boundaries. For
$j \in I(s-1)$, they are defined recursively as follows:

1. the (s-1)th boundaries divide a $2^s \times 2^s$ array into four
$2^{s-1} \times 2^{s-1}$ subarrays;

2. the jth boundaries divide each of the $2^{2(s-j-1)}$ arrays of size
$2^{j+1} \times 2^{j+1}$ into four $2^j \times 2^j$ subarrays.

In Figure 2, the (s-1)th boundaries and the (s-2)th boundaries of the $2^s \times 2^s$
array are denoted by boldface lines and dotted lines, respectively. The
ith boundary is said to be higher than the jth boundary if i is larger
than j.
Figure 2. (s-1)th (boldface) and (s-2)th (dotted) boundaries of a $2^s \times 2^s$ array.

All logarithms are to the base two.


A cell is said to be at the vertical (horizontal) jth boundary, j > 0,
if it is in column (row) $2^j(2i-1) + k$, for some $i \in I(2^{s-j-1})$ and some
$k \in \{0, 1\}$. Only the cells in columns (rows) 1 and $2^s$ are said to be at
the vertical (horizontal) 0th boundaries.
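The classification of columns by boundary can be made concrete with a short sketch. It assumes the definition just given (a column is at the jth boundary, j > 0, if it equals $2^j(2i-1)+k$ with $k \in \{0,1\}$, and otherwise at the 0th boundary); the helper name `boundary_index` is hypothetical.

```python
def boundary_index(col, s):
    """Return j such that column `col` (1-based) of a 2^s-wide array is at
    the vertical j-th boundary, following the definition in the text."""
    if not 1 <= col <= 2 ** s:
        raise ValueError("column out of range")
    for j in range(s - 1, 0, -1):
        block = 2 ** j
        for k in (0, 1):
            quotient, remainder = divmod(col - k, block)
            # col - k must be an odd multiple of 2^j, i.e. 2^j * (2i - 1).
            if remainder == 0 and quotient % 2 == 1:
                return j
    return 0  # only columns 1 and 2^s end up here


indices = [boundary_index(c, 3) for c in range(1, 9)]
```

For an array of width 8 the sketch classifies columns 4 and 5 as being at the 2nd boundary, columns 2, 3, 6 and 7 at the 1st boundary, and columns 1 and 8 at the 0th boundary. The same function applies to rows.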


A finite sequence of nonnegative integers $(a_1, a_2, \ldots)$ is called a
duplicating pattern if $a_j$ is the number of duplications at each jth boundary.
The duplicating patterns suggest a systematic duplication at the boundaries.


Suppose A is an n x n array, $n = 2^s + t$, $s \ge 1$ and $1 \le t \le 2^s$, and suppose each
cell $c_{i,j}$ of A contains a record $r_{i,j}$, $i,j \in I(n)$. Then the extended
array A* of A under the duplicating pattern $(a_1, a_2, \ldots, a_s)$ is defined
recursively as follows. If s = 0, A* is A. Otherwise, A* is a
$2^{s+1} \times 2^{s+1}$ array consisting of four $2^s \times 2^s$ subarrays $A^*_{11}$, $A^*_{12}$, $A^*_{21}$ and
$A^*_{22}$. Each $A^*_{i,j}$ is the extended array of $A_{i,j}$ under the duplicating pattern
$(a_1, a_2, \ldots, a_{s-1})$, and $A_{i,j}$ is an n' x n' array, where $n' = (n + a_s)/2$, containing
records $r_{k,l}$ with

\[ k = (i-1)n''+1,\ (i-1)n''+2,\ \ldots,\ (i-1)n''+n', \qquad l = (j-1)n''+1,\ \ldots,\ (j-1)n''+n', \]

where $n'' = (n - a_s)/2$. The construction of the extended
array of a 5 x 5 array under the duplicating pattern (1,1) is illustrated in
Figure 3.
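The recursive construction can be sketched as follows. The sketch assumes, as the definition suggests, that rows and columns are duplicated in the same way, so it is enough to extend a one-dimensional list of indices and then take the product; the names `extended_indices` and `extended_array` are hypothetical.

```python
def extended_indices(indices, pattern):
    """Extend a list of row (or column) indices under a duplicating
    pattern (a_1, ..., a_s). The last entry a_s is applied first."""
    if not pattern:
        return list(indices)
    a_s = pattern[-1]
    n = len(indices)
    if (n + a_s) % 2:
        raise ValueError("n + a_s must be even")
    n1 = (n + a_s) // 2  # n'  : size of each half
    n2 = (n - a_s) // 2  # n'' : offset of the second half
    first = indices[:n1]
    second = indices[n2:n2 + n1]
    rest = pattern[:-1]
    return extended_indices(first, rest) + extended_indices(second, rest)


def extended_array(n, pattern):
    """Return A* as a list of rows; each entry is the (row, column)
    index of the record of A stored in that cell."""
    idx = extended_indices(list(range(1, n + 1)), tuple(pattern))
    return [[(r, c) for c in idx] for r in idx]


rows = extended_indices(list(range(1, 6)), (1, 1))
a_star = extended_array(5, (1, 1))
```

For the 5 x 5 array under the pattern (1,1), the first step splits the indices into 1..3 and 3..5 (record 3 is duplicated), and the second step splits each half again, giving the index list 1, 2, 2, 3, 3, 4, 4, 5 and an 8 x 8 extended array.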
Figure 3. Construction of the extended array A* of a 5 x 5 array A under
the duplicating pattern (1,1): (a) a 5 x 5 array A; (b) 3 x 3 arrays $A_{i,j}$;
(c) extended array A* of A under (1,1).


It is assumed that the probability of accessing a cell $c_{i,j}$ containing
record $r_{i,j}$ in the array A is equal to the sum of the probabilities of accessing
the cells containing the record $r_{i,j}$ in the extended array A*.

The graphical model $G_{A^*}$ of an extended array A* of a $(2^s+t) \times (2^s+t)$
array A under the duplicating pattern $(a_1, a_2, \ldots, a_s)$ is a directed graph
(V,E), where $V = \{c^*_{i,j} \mid i,j \in I(2^{s+1})\}$ and

\[ E = \{(c^*_{i,j}, c^*_{i,k_j}),\ (c^*_{i,j+1}, c^*_{i,l_{j+1}}) \mid i,j \in I(2^{s+1}-1)\} \ \cup\ \{(c^*_{i,j}, c^*_{k_i,j}),\ (c^*_{i+1,j}, c^*_{l_{i+1},j}) \mid i,j \in I(2^{s+1}-1)\}. \]

In the first set, $k_j$ is the smallest integer larger than j such that
$c^*_{i,k_j}$ contains $r_{x,y+1}$ if $c^*_{i,j}$ contains $r_{x,y}$, and $l_j$ is
the largest integer smaller than j such that $c^*_{i,l_j}$ contains
$r_{x,y-1}$ if $c^*_{i,j}$ contains $r_{x,y}$. In the second set, $k_i$ is the
smallest integer larger than i such that $c^*_{k_i,j}$ contains
$r_{x+1,y}$ if $c^*_{i,j}$ contains $r_{x,y}$, and $l_i$ is the largest integer
smaller than i such that $c^*_{l_i,j}$ contains $r_{x-1,y}$ if $c^*_{i,j}$
contains $r_{x,y}$.

In the graphical models as defined above, there are two classes of
connections: the horizontal connections, which connect vertices in the
same row, and the vertical connections, which connect vertices in the same
column. Furthermore, all rows have the same horizontal connections
and all the columns have the same vertical connections. The graphical
model $G_{A^*}$ of the extended array A* constructed in Figure 3 is shown in
Figure 4.
CHAPTER 3

BINARY  TREE  MEMORY  ORGANIZATION 
3.1.  Memory  Configuration 

There  is  a distinction  between  linear  memory  and  arbitrary  lists  [4] 
in  array  storage  mapping,  namely  arrays  can  be  stored  as  nonlinear  lists 
with  bounded  loss  of  average  proximity,  but  cannot  be  stored  in  a linear 
memory.  In  this  paper,  binary  tree  memories  will  be  used  as  the  storage 
structure. 

A binary tree memory structure, BTMS, is hierarchically organized.
It is either a single memory cell or it consists of a complete binary tree
of four sub-BTMSs of equal size. Each memory cell can hold one record. The
level of a BTMS which consists of a memory cell is zero; otherwise the
level of a BTMS is two plus the level of one of its sub-BTMSs. BTMSs with zero, two and
four levels are shown in Figure 5.


Figure 5. BTMSs with 0, 2 and 4 levels: (a) 0 level, (b) 2 levels.

In  the  figure,  the  circles  denote  linkage  cells  and  the  squares  denote 
data  cells. 
3.2. Array Storage Mapping

The binary tree memory BTMS is described in the last section.
A recursive method of mapping an array of records into memory cells of a
BTMS will be described in this section.

Consider an n x n extended or non-extended array A, where $n = 2^s$,
and divide A along the (s-1)th boundaries to form four n/2 x n/2 subarrays
$A_{11}$, $A_{12}$, $A_{21}$, and $A_{22}$. Suppose A is to be stored in a BTMS B which
consists of four sub-BTMSs $B_1$, $B_2$, $B_3$, $B_4$ from left to right at the second
level of the hierarchy. Then $A_{11}$, $A_{12}$, $A_{21}$, and $A_{22}$ are stored in
$B_1$, $B_2$, $B_3$, and $B_4$, respectively. The process is continued recursively
until the subarrays are of order 1 x 1, which results in mapping a record
into a memory cell.
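One way to picture the recursive mapping is to number the leaves of the BTMS from left to right and compute the leaf reached by each cell. The sketch below assumes the quadrant order $A_{11}, A_{12}, A_{21}, A_{22}$ described above, which amounts to interleaving the bits of the row and column numbers; the function name `leaf_address` and the 0-based leaf numbering are choices made for this illustration only.

```python
def leaf_address(i, j, s):
    """Leaf number (0-based, left to right) of cell (i, j), 1-based,
    when a 2^s x 2^s array is stored quadrant by quadrant."""
    row, col = i - 1, j - 1
    address = 0
    for bit in range(s - 1, -1, -1):
        # At each level pick one of four sub-BTMSs: the row bit chooses
        # top or bottom, the column bit chooses left or right.
        quadrant = (((row >> bit) & 1) << 1) | ((col >> bit) & 1)
        address = (address << 2) | quadrant
    return address


layout = [[leaf_address(i, j, 2) for j in range(1, 5)] for i in range(1, 5)]
```

For a 4 x 4 array, the top-left 2 x 2 subarray occupies leaves 0 to 3, the top-right subarray leaves 4 to 7, the bottom-left subarray leaves 8 to 11 and the bottom-right subarray leaves 12 to 15.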

Figure 6 shows the results of mapping a 4 x 4 array A of records
$r_{i,j}$, $i,j \in I(4)$, into the memory cells of a BTMS with four levels.

With the graphical models described in Chapter 2, storing a
$2^s \times 2^s$ array A in a 2s-level BTMS is represented as mapping the set of
vertices of the graph into the leaves of a complete binary tree of
height 2s.

When n is not a power of two, the storage mapping is applied to a
$2^{\lceil \log n \rceil} \times 2^{\lceil \log n \rceil}$ array and the records of the original array are held in
the first n rows and first n columns in the top left corner of the new
array. The remaining cells are empty.
CHAPTER 4

STRETCHES  UNDER  THE  STORAGE  MAPPING 
4.1.  Definition  of  Stretch 

Under the storage mapping defined in the previous chapter, the image of
an arc (v,w) in $G_A$ is defined as the path from the image of vertex v to the
image of vertex w in the BTMS. The ratio of the length of the image of an
arc (v,w) to the length of the arc (v,w) is called the stretch of (v,w)
under the mapping. All arcs have length one, hence the stretch of an arc
is an integer equal to the path length of the image of the arc. It will be
shown in the following lemma that the stretch of an arc is completely
determined by the position of the arc in the graph.

Lemma  4.1. 

Let the vertical jth (horizontal ith) boundary be the highest boundary
that an arc e crosses in the graph. The stretch of the arc e is 4j + 2
(4i + 4).

Proof 

Consider a horizontal arc (v,w) in $G_A$ (refer to Figure 7(a)) and suppose the jth
boundary is the highest vertical boundary that (v,w) crosses. Then v
and w are in two horizontally adjacent $2^j \times 2^j$ arrays and their images
under the mapping will be in the left subtree and the right subtree of a
BTMS of level 2j + 1. Therefore, the stretch of (v,w) is 4j + 2.

For a vertical arc (v,w), v and w are in two vertically adjacent
$2^i \times 2^i$ arrays if the horizontal ith boundary is the highest boundary that
the arc (v,w) crosses. The images of v and w under the mapping are in the
left subtree and the right subtree of a BTMS of level 2i + 2. Therefore,
the stretch of (v,w) is 4i + 4. □
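Lemma 4.1 can be checked numerically on a small array. The sketch below is an illustration under two assumptions that are not spelled out in this form in the text: leaves are numbered left to right by interleaving row and column bits (the quadrant order of Section 3.2), and the boundary between positions b and b+1 is the jth boundary, where j is the number of trailing zero bits of b. All function names are hypothetical.

```python
def leaf_address(i, j, s):
    """Leaf number of cell (i, j), 1-based, in a 2^s x 2^s array."""
    row, col = i - 1, j - 1
    address = 0
    for bit in range(s - 1, -1, -1):
        address = (address << 2) | (((row >> bit) & 1) << 1) | ((col >> bit) & 1)
    return address


def path_length(a, b):
    """Number of tree edges between leaves a and b of a complete binary tree."""
    height = 0
    while a != b:          # climb until both leaves reach a common ancestor
        a >>= 1
        b >>= 1
        height += 1
    return 2 * height


def trailing_zeros(x):
    return (x & -x).bit_length() - 1


def check_lemma_4_1(s):
    size = 2 ** s
    for fixed in range(1, size + 1):
        for b in range(1, size):
            j = trailing_zeros(b)  # index of the boundary between b and b + 1
            horizontal = path_length(leaf_address(fixed, b, s),
                                     leaf_address(fixed, b + 1, s))
            vertical = path_length(leaf_address(b, fixed, s),
                                   leaf_address(b + 1, fixed, s))
            if horizontal != 4 * j + 2 or vertical != 4 * j + 4:
                return False
    return True
```

Under these assumptions every horizontal arc crossing a jth boundary has stretch 4j + 2 and every vertical arc crossing an ith boundary has stretch 4i + 4, as the lemma states.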


Figure 7. Illustration of the proof of Lemma 4.1: (a) stretch of a
horizontal arc, (b) stretch of a vertical arc.


4.2.  Average  Stretch 

4.2.1.  Probability  distribution  of  the  vertices  of  the  array 
The collection of sequences of visits to the vertices of the graph
is a stochastic process. We assume that this process is a random walk
[5], that is, the conditional probability of traversing an arc originating
at a vertex v is a constant which is equal to the reciprocal of the
cardinality of Succ(v), and the moves are independent. Furthermore, the
graph $G_A$ is irreducible.


In an array of records $r_{i,j}$, $i,j \in \{0, 1, \ldots, n-1\}$, record $r_{i,j}$ is even
if (i+j) mod 2 = 0 and is odd otherwise. A vertex is called even (odd) if
it represents a cell containing an even (odd) record. In a graph model
$G_A = (V,E)$, the vertex set V can be partitioned into two sets: $V_1$ consists
of odd vertices and $V_2$ consists of even vertices. At even time units, only
vertices in a particular set $V_1$ or $V_2$ can be accessed; without loss of
generality say the set is $V_2$, so at odd time units, only vertices in $V_1$ can
be accessed. Hence, from the graph $G_A = (V,E)$, we can obtain two disjoint
graphs $G_1 = (V_1,E_1)$ and $G_2 = (V_2,E_2)$, where $E_i = \{(v,w) \mid v,w \in V_i$ and there is
a path from v to w of length 2 in $G_A\}$ for i = 1,2. Let $\mathrm{Pred}_i(v)$ be the set
of predecessors of the vertex v in $G_i$. Each of $G_1$ and $G_2$ is an irreducible
and aperiodic Markov process. Therefore, we have the following lemma.

Lemma  4.2. 

For a given graph $G_A = (V,E)$ of an array A, there exist two unique
limiting (steady-state) probability distributions $p_1$ and $p_2$ of V such that
at odd time units, $p_1$ satisfies the following set of equations:

\[ p_1(v) = \sum_{w \in \mathrm{Pred}_1(v)} p_1(w)\,\Pr\nolimits_1(v \mid w) \quad \text{for all } v \in V_1, \]
\[ p_1(v) = 0 \quad \text{for all } v \in V_2, \]
\[ \sum_{v \in V_1} p_1(v) = 1; \]

at even time units, $p_2$ satisfies the following set of equations:

\[ p_2(v) = \sum_{w \in \mathrm{Pred}_2(v)} p_2(w)\,\Pr\nolimits_2(v \mid w) \quad \text{for all } v \in V_2, \qquad (*) \]
\[ p_2(v) = 0 \quad \text{for all } v \in V_1, \]
\[ \sum_{v \in V_2} p_2(v) = 1. \]

Here, for $v, w \in V_i$ with paths of length 2 between v and w in $G_A$,
$\Pr_i(v \mid w)$ denotes the probability of reaching v from w in two time units.

Since a randomly selected time unit is odd or even with equal
probabilities, we can take the arithmetic mean p of $p_1$ and $p_2$ as the
stationary distribution at all times. A stationary probability distribution
p of V of $G_A = (V,E)$ is one which satisfies the following equations:

\[ p(v) = \sum_{w \in \mathrm{Pred}(v)} p(w)\,\Pr(v \mid w) \quad \text{for all } v \in V \qquad (4.1) \]

and

\[ \sum_{v \in V} p(v) = 1. \]

Lemma 4.3.

The distribution $p = (p_1 + p_2)/2$ of V is stationary.


Proof:

Since $p_1$ and $p_2$ are steady-state distributions at odd and even time units
respectively, then

\[ \sum_{w \in \mathrm{Pred}(v)} p_2(w)\Pr(v \mid w) = \sum_{\substack{w \in V_1 \\ w \in \mathrm{Pred}(v)}} p_2(w)\Pr(v \mid w) + \sum_{\substack{w \in V_2 \\ w \in \mathrm{Pred}(v)}} p_2(w)\Pr(v \mid w). \]

But the first term is 0, since $p_2(w) = 0$ when $w \in V_1$, and the second term
is equal to $p_1(v)$, thus

\[ \text{(i)} \quad p_1(v) = \sum_{w \in \mathrm{Pred}(v)} p_2(w)\Pr(v \mid w). \]

Similarly we obtain

\[ \text{(ii)} \quad p_2(v) = \sum_{w \in \mathrm{Pred}(v)} p_1(w)\Pr(v \mid w). \]

Summing (i) and (ii) and dividing both sides by 2, we obtain

\[ \frac{p_1(v) + p_2(v)}{2} = \sum_{w \in \mathrm{Pred}(v)} \frac{p_1(w) + p_2(w)}{2}\,\Pr(v \mid w), \]

which satisfies equation (4.1). □
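The averaging argument can be illustrated on the 3 x 3 array with no duplication. The sketch below takes $p_1$ and $p_2$ proportional to the number of successors on the odd and even vertices respectively (an assumption made for the illustration, justified only by the check itself), and verifies with exact fractions that one step of the walk maps $p_2$ to $p_1$, maps $p_1$ to $p_2$, and leaves their mean unchanged. Cells are numbered from 1 here, which does not change the partition into two parity classes. All names are hypothetical.

```python
from fractions import Fraction


def neighbours(i, j, n):
    candidates = [(i, j - 1), (i, j + 1), (i - 1, j), (i + 1, j)]
    return [(a, b) for a, b in candidates if 1 <= a <= n and 1 <= b <= n]


def step(dist, n):
    """One move of the random walk: each vertex sends its probability
    in equal shares to its successors."""
    new = {v: Fraction(0) for v in dist}
    for (i, j), mass in dist.items():
        succ = neighbours(i, j, n)
        for v in succ:
            new[v] += mass / len(succ)
    return new


n = 3
cells = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)]
# Every arc joins the two parity classes, so each class holds half the arcs.
half = sum(len(neighbours(i, j, n)) for i, j in cells) // 2
p1 = {(i, j): Fraction(len(neighbours(i, j, n)), half) if (i + j) % 2 == 1
      else Fraction(0) for i, j in cells}
p2 = {(i, j): Fraction(len(neighbours(i, j, n)), half) if (i + j) % 2 == 0
      else Fraction(0) for i, j in cells}
p = {v: (p1[v] + p2[v]) / 2 for v in cells}
```

The two distributions alternate under the walk, which is why neither is stationary on its own, while their mean satisfies equation (4.1).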

Since it is assumed that the conditional probability of traversing
an arc is a constant, the stationary probability distribution of the set
of arcs can be easily obtained from the stationary probability distribution
of the set of vertices. The stationary probability distribution p(i),
$i \in I(n^2)$, can be expressed as the product of a positive weight w(i) and a
normalization factor equal to the reciprocal of the total weight of all the
vertices. Therefore, equation (4.1) can be written as follows.

\[ w(i) = \sum_{j \in \mathrm{Pred}(i)} w(j)\,\Pr(i \mid j) \quad \text{for all } i \in V \qquad (4.2) \]


The graph $G_A$ of a square array A with no duplication is highly
symmetrical. Hereafter we let (i,j) denote the vertex at the intersection
of row i and column j in A. The following lemma gives the stationary
weight distribution for the vertices.
Lemma 4.4.

The vertices of the graph $G_A = (V,E)$, $V = \{(i,j) \mid i,j \in I(n)\}$, have the
following weight distribution (Figure 8):

\[ w(1,1) = w(1,n) = w(n,1) = w(n,n) = 2, \]
\[ w(1,j) = w(i,1) = w(n,j) = w(i,n) = 3 \quad \text{for all } i,j = 2, 3, \ldots, n-1, \]
\[ w(i,j) = 4 \quad \text{for all } i,j = 2, 3, \ldots, n-1, \]

and the normalization factor is 4n(n-1).

Proof:

It is required to show that for every vertex $(i,j) \in V$, equation (4.2) is
satisfied by the weight distribution given in the lemma.
For (i,j) = (1,1), (n,1), (1,n) and (n,n), $w(i,j) = 3 \cdot \tfrac{1}{3} + 3 \cdot \tfrac{1}{3} = 2$,
so the equation is satisfied. For other vertices, it can be shown
analogously that the equations (4.2) are satisfied.

The normalization factor is the sum of all the weights, $\sum_{i,j \in I(n)} w(i,j)$,
which is equal to 4n(n-1). □

Figure 8. Weight distribution of an array with no duplication
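The weights of Lemma 4.4 can be checked against equation (4.2) with exact arithmetic. The following is a small sketch for illustration; the helper names are hypothetical, and the check relies on the observation that each weight in the lemma equals the number of successors of the vertex.

```python
from fractions import Fraction


def neighbours(i, j, n):
    candidates = [(i, j - 1), (i, j + 1), (i - 1, j), (i + 1, j)]
    return [(a, b) for a, b in candidates if 1 <= a <= n and 1 <= b <= n]


def weight(i, j, n):
    """Weight from Lemma 4.4: 2 at corners, 3 on the border, 4 inside."""
    return 4 - (i in (1, n)) - (j in (1, n))


def balance_holds(n):
    """Check w(v) = sum over predecessors u of w(u) * Pr(v | u), with
    Pr(v | u) = 1 / |Succ(u)|, for every vertex v."""
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            rhs = sum(Fraction(weight(a, b, n), len(neighbours(a, b, n)))
                      for a, b in neighbours(i, j, n))
            if rhs != weight(i, j, n):
                return False
    return True


def total_weight(n):
    return sum(weight(i, j, n)
               for i in range(1, n + 1) for j in range(1, n + 1))
```

Each predecessor contributes exactly 1 to the right-hand side, so the sum equals the number of neighbours of the vertex, which is its weight; the total weight comes to 4n(n-1).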


From Lemma 4.4 and the definition of weight distribution, we have the
following theorem.

Theorem  4.1. 

Let A be an n x n array with no duplication. Then the vertices in the
set $V = \{(i,j) \mid i,j \in I(n)\}$ of the graph $G_A$ have the following stationary
probability distribution:

\[ p(i,j) = 2/\theta \quad \text{for } i,j \in \{1, n\}, \]
\[ p(1,j) = p(n,j) = p(i,1) = p(i,n) = 3/\theta \quad \text{for } i,j = 2, 3, \ldots, n-1, \]
\[ p(i,j) = 4/\theta \quad \text{for } i,j = 2, 3, \ldots, n-1, \]

where $\theta = 4n(n-1)$.

Corollary 4.1.

The arcs of E in $G_A$ with no duplication are equiprobable.

Proof:

For each arc $(v,w) \in E$,

\[ \Pr((v,w)) = p(v) \cdot \Pr(w \mid v) \]

and

\[ \Pr(w \mid v) = \begin{cases} 1/2 & \text{if } v \text{ is at position } (1,1), (1,n), (n,1), \text{ or } (n,n), \\ 1/3 & \text{if } v \text{ is at position } (1,j), (n,j), (i,1), (i,n) \text{ for } 2 \le i,j \le n-1, \\ 1/4 & \text{otherwise.} \end{cases} \]

Thus, by Theorem 4.1, $\Pr((v,w)) = 1/(4n(n-1))$ for all edges $(v,w) \in E$. □

The graph $G_{A^*}$ of the extended array A* of a square array A is more
complicated. The following lemmas are established for finding the
stationary probability distribution of the set of vertices of $G_{A^*}$.


Lemma 4.5.

For a graph $G_{A^*} = (V,E)$, $V = \{(i,j) \mid i,j \in I(2^{s+1})\}$, of a $2^{s+1} \times 2^{s+1}$
extended array A*, we have the following relations among the terms of w,
the stationary weight distribution of V (Figure 9):

\[ w(i,j) = \begin{cases} w(j,\ i) \\ w(j,\ 2^{s+1}+1-i) \\ w(i,\ 2^{s+1}+1-j) \\ w(2^{s+1}+1-i,\ 2^{s+1}+1-j) \\ w(2^{s+1}+1-j,\ 2^{s+1}+1-i) \\ w(2^{s+1}+1-j,\ i) \\ w(2^{s+1}+1-i,\ j) \end{cases} \quad \text{for } i,j \in I(2^s). \]

Proof:

Observe that the graph $G_{A^*}$ is symmetrical (by definition) about the
horizontal sth boundary, the vertical sth boundary, and the main diagonal. □

Figure 9. Symmetries of the weight distribution (the vertices at the
marked positions have identical weight).


From Lemma 4.5, we only have to find the weights of the vertices
(i,j) for $i = 1, 2, \ldots, 2^s$ and $j = 1, 2, \ldots, i$ (the shaded triangle in
Figure 9) rather than those corresponding to all $(2^{s+1})^2$ positions.

Lemma  4.6. 

Let $w(i,2^s)$ and $w(2^s,j)$, $1 \le i,j \le 2^s$, be given. The set of weights

\[ w(1,1) = \tfrac{8}{9}\, w(1,2^s) \cdot w(2^s,1), \]
\[ w(i,j) = w(i,2^s) \cdot w(2^s,j) \quad \text{for } 1 \le i,j \le 2^s \text{ and } (i,j) \ne (1,1), \]

and w(i,j) as in Lemma 4.5 for i or $j > 2^s$ satisfies equation (4.2)
(refer to Figure 10).

Proof:

Observe that $w(2^s,j) = w(j,2^s)$ due to the symmetry of the graph $G_{A^*}$
about the diagonal. Assuming $(i,j) \ne (1,1)$, it is required to show that
$w(i,j) = w(2^s,i) \cdot w(2^s,j)$ satisfies equation (4.2), which is

\[ w(i,j) = \sum_{(i,k) \in \mathrm{Pred}(i,j)} w(i,k)\,\Pr((i,j) \mid (i,k)) + \sum_{(k,j) \in \mathrm{Pred}(i,j)} w(k,j)\,\Pr((i,j) \mid (k,j)). \]

Recall that all the rows have the same horizontal connections and all the
columns have the same vertical connections in $G_{A^*}$, and let $C_i$ be 4/3 if
$i \in \{1, 2^{s+1}\}$ and 1 otherwise. Hence, the right hand side of the equation
can be expressed as

\[ w(2^s,i) \sum_{(2^s,k) \in \mathrm{Pred}(2^s,j)} w(2^s,k)\, C_i \Pr((2^s,j) \mid (2^s,k)) + w(2^s,j) \sum_{(k,2^s) \in \mathrm{Pred}(i,2^s)} w(k,2^s)\, C_j \Pr((i,2^s) \mid (k,2^s)). \qquad (*) \]

Since

\[ w(2^s,2^s) = \sum_{(2^s,k) \in \mathrm{Pred}(2^s,2^s)} w(2^s,k)\Pr((2^s,2^s) \mid (2^s,k)) + \sum_{(k,2^s) \in \mathrm{Pred}(2^s,2^s)} w(k,2^s)\Pr((2^s,2^s) \mid (k,2^s)) \]

and (due to the symmetry around the main diagonal) the two terms on the
right side are equal,

\[ w(2^s,2^s) = 2 \sum_{(k,2^s) \in \mathrm{Pred}(2^s,2^s)} w(k,2^s)\Pr((2^s,2^s) \mid (k,2^s)). \]

Furthermore, the definition of w(i,j) implies that $w(2^s,2^s) = 1$. Letting
$i = 2^s$ in the expression (*) we have

\[ w(2^s,j) = w(2^s,2^s) \sum_{(2^s,k) \in \mathrm{Pred}(2^s,j)} w(2^s,k)\, C_{2^s} \Pr((2^s,j) \mid (2^s,k)) + w(2^s,j) \sum_{(k,2^s) \in \mathrm{Pred}(2^s,2^s)} w(k,2^s)\, C_j \Pr((2^s,2^s) \mid (k,2^s)). \]

Hence,

\[ w(2^s,j) = \frac{1}{1 - C_j/2} \sum_{(2^s,k) \in \mathrm{Pred}(2^s,j)} w(2^s,k)\Pr((2^s,j) \mid (2^s,k)). \]

Substituting the identity just obtained into expression (*), we have

\[ w(i,j) = \begin{cases} w(2^s,i) \cdot w(2^s,j) & \text{for } 1 \le i,j \le 2^s \text{ and } (i,j) \ne (1,1), \\ \tfrac{8}{9}\, w(2^s,1) \cdot w(2^s,1) & \text{for } i = j = 1. \end{cases} \]


Figure 10. Weight distribution in the upper left quadrant of
a $2^{s+1} \times 2^{s+1}$ array.


From Lemmas 4.5 and 4.6, we only have to find the weights $w(2^s,j)$,
denoted as w(j), for $j = 1, \ldots, 2^s$, rather than those corresponding to
all $(2^{s+1})^2$ positions. For an extended array of a $(2^s+t) \times (2^s+t)$ array
under the duplicating pattern $(a_1, a_2, \ldots, a_s)$, the weights w(j) will be
found by a sequence of s iterations. In the initial step, all $a_i$ except
$a_s$ are assumed to be zero and we find the weights $\bar{w}^s(j)$ for the pattern
$(0, 0, \ldots, 0, a_s)$. In the next step, we extend the pattern to include
$a_{s-1}$ and find $\bar{w}^{s-1}(j)$, for $(0, 0, \ldots, 0, a_{s-1}, a_s)$, in terms of $\bar{w}^s(j)$.
This process is continued until the pattern is extended to include
all $a_i$'s; at this point, the weights $\bar{w}^1(j)$ are the w(j), for $j = 2, \ldots, 2^s$,
and $w(1) = \tfrac{3}{4}\,\bar{w}^1(1)$. The weights $\bar{w}^i(j)$ are generated in each iteration i
as follows. Initially, we have
For example, for s = 4, t = 1 and $(a_1, a_2, a_3, a_4) = (0, 1, 1, 9)$, the weights are
generated as follows. (Notice that there is no duplication at the 1st
boundaries, therefore $\bar{w}^2(j) = \bar{w}^1(j)$.)


$\bar{w}^4(j)$: 10 10 10 10 9 8 7 6 5 4 3 2 1

$\bar{w}^3(j)$: 10 10 10 10 9 ? 6 5 4 1 3 2 1

$\bar{w}^2(j)$: 10 10 10 ? 9 8 4 3 6 5 ? ? 2 2 1

$\bar{w}^1(j)$: 10 10 10 5 5 9 8 4 3 6 5 5 2 3 2 2 1

w(j): 15/2 10 10 5 5 9 8 4 3 6 5 5 2 3 2 2 1

The set of weights $\bar{w}^i(j)$ obtained at each iteration i and w(k,l),
for $1 \le k,l \le 2^s$, given in Lemma 4.6 satisfies equation (4.2).

Theorem  4.2. 

The procedure in Algorithm 4.1 correctly generates the weight
distribution w(j), for $j \in I(2^s)$, for the extended array of a $(2^s+t) \times (2^s+t)$
array under the duplicating pattern $(a_1, \ldots, a_s)$.

Proof:

See Appendix A. □

Theorem  4.3. 

Let A* be the extended array of a $(2^s+t) \times (2^s+t)$ array A under the
duplicating pattern $(a_1, a_2, \ldots, a_s)$. The normalization factor $\sum w(i,j)$,
for a weight distribution w of a graph $G_{A^*} = (V,E)$, $V = \{(i,j) \mid i,j \in I(2^{s+1})\}$,
is equal to $(2^s+t)(2^s+t-1)(a_s+1)^2$.

Proof:

Observe that

\[ \sum_{(i,j)\ \text{contains record}\ r} w(i,j) = \begin{cases} \tfrac{1}{2}(a_s+1)^2 & \text{if } r \text{ is at the corner of } A, \\ \tfrac{3}{4}(a_s+1)^2 & \text{if } r \text{ is at the lowest boundary,} \\ (a_s+1)^2 & \text{otherwise.} \end{cases} \]

□

s s 

The limiting probability distribution p of the vertices of a graph
$G_{A^*}$ can be obtained by Lemmas 4.5 and 4.6 and Theorems 4.2 and 4.3.

Algorithm  4.1. 
begin 

for  j £ I(2S)  and  i 6 I(23),  set  dj  and  X^(j)  to  zero; 
set  ag  to  zero; 

for  i 6 I((2S+t)/2),  set  ^S(i)  to  ag+l; 

[£j  is  the  number  of  distinct  records  between  a jth  boundary 
and  the  nearest  boundary  which  is  higher  than  jC^] 


Vi  ■ ” 

for  j = 1 to  s do  ij  ■ (^j+1  + 


j = s; 

LP:  while  j ^ L do 
begin 

for  i » 1 to  2s step  2 do 
begin 


FPG(1  + (i-l)Xj.j) 


Procedure  BWG(f,j): 
begin 


for  1 ■ f to  f + a • 1 do 

^(i)  - ^+1(i  - (dj+Daj)  - wl(i-aj); 
for  i ■ f + a^  to  f + X - 1 do 


begin 


^(i)  =wj+1(i  - (d+l)a  ); 
Xj(i)  - Xj+1(i  - (d  +l)a  ); 

end; 

X*(f  + a - 1)  - ^(f  - 1); 


end: 


end. 




4.2.2.  Calculation  of  average  stretch 

The average stretch of the arcs of a graph $G_{A^*} = (V,E)$ is defined as

\[ \sum_{e \in E} \Pr(e) \cdot \text{stretch of } e. \]

It can be expressed as

\[ \sum_{v \in V} \sum_{w \in \mathrm{Succ}(v)} p(v) \cdot \Pr(w \mid v) \cdot \text{stretch of arc } (v,w). \]

Since the conditional probability Pr(w|v) is a constant 1/|Succ(v)|
for all $w \in \mathrm{Succ}(v)$, the average stretch is

\[ \sum_{v \in V} \frac{p(v)}{|\mathrm{Succ}(v)|} \sum_{w \in \mathrm{Succ}(v)} \text{stretch of arc } (v,w). \qquad (4.3) \]

Let S(i,j) be the sum of the stretches of the arcs originating at a
particular vertex v which is at the horizontal ith and the vertical jth
boundaries, that is

\[ S(i,j) = \sum_{w \in \mathrm{Succ}(v)} \text{stretch of } (v,w). \]

In the following lemma, it will be shown that S(i,j) is a function
of i and j.

Lemma  4.7. 


\[ S(i,j) = \begin{cases} 12 + 4i + 4j & \text{for } i,j > 0, \\ 8 + 4j & \text{for } i = 0 \text{ and } j > 0, \\ 10 + 4i & \text{for } i > 0 \text{ and } j = 0, \\ 6 & \text{for } i = j = 0. \end{cases} \]


Proof:

In Figure 11 it is observed that there is a horizontal arc and a
vertical arc directed from a vertex v at the horizontal ith boundary
and at the vertical jth boundary, and these boundaries are the highest
ones which these arcs must cross. If j (i) is greater than zero, there
is an additional horizontal (vertical) arc directed from v, and the
vertical (horizontal) 0th boundary is the highest boundary that this arc
crosses. By Lemma 4.1, the lemma follows. □

Figure 11. Arcs directed from a vertex at the horizontal ith and the
vertical jth boundaries.
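The case analysis of Lemma 4.7 can be checked by summing the stretches given by Lemma 4.1 over the arcs leaving each vertex of a $2^s \times 2^s$ array. The sketch assumes that the boundary between positions b and b+1 is the jth boundary with j equal to the number of trailing zero bits of b, and that a row or column is at the highest boundary adjacent to it; both readings are assumptions drawn from the definitions in Chapter 2, and all names are hypothetical.

```python
def trailing_zeros(x):
    return (x & -x).bit_length() - 1


def adjacent_boundaries(pos, s):
    """Indices of the boundaries on either side of row/column `pos`."""
    size = 2 ** s
    return [trailing_zeros(b) for b in (pos - 1, pos) if 1 <= b < size]


def stretch_sum(row, col, s):
    """Sum of stretches of all arcs leaving (row, col), using Lemma 4.1:
    4j + 2 for a horizontal arc, 4i + 4 for a vertical arc."""
    horizontal = sum(4 * j + 2 for j in adjacent_boundaries(col, s))
    vertical = sum(4 * i + 4 for i in adjacent_boundaries(row, s))
    return horizontal + vertical


def lemma_4_7(i, j):
    if i > 0 and j > 0:
        return 12 + 4 * i + 4 * j
    if i == 0 and j > 0:
        return 8 + 4 * j
    if i > 0 and j == 0:
        return 10 + 4 * i
    return 6


def check_lemma_4_7(s):
    size = 2 ** s
    for row in range(1, size + 1):
        for col in range(1, size + 1):
            i = max(adjacent_boundaries(row, s))
            j = max(adjacent_boundaries(col, s))
            if stretch_sum(row, col, s) != lemma_4_7(i, j):
                return False
    return True
```

A corner vertex, for example, has one horizontal arc of stretch 2 and one vertical arc of stretch 4, giving S(0,0) = 6.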


By Theorem 4.3 and equation (4.3), and Lemmas 4.5 and 4.6,
the average stretch ES of a graph $G_{A^*}$, where A* is the extended array of
a $(2^s+t) \times (2^s+t)$ array under the duplicating pattern $(a_1, a_2, \ldots, a_s)$,
can be expressed as follows.

\[ ES = \frac{4}{(2^s+t)(2^s+t-1)(a_s+1)^2} \left( \frac{1}{2} \cdot \frac{(a_s+1)^2}{2}\, S(0,0) + \frac{1}{3} \cdot \frac{3}{4}(a_s+1)\, F + \frac{1}{3} \cdot \frac{3}{4}(a_s+1)\, H + \frac{1}{4}\, G \right) \qquad (4.4) \]

where

\[ F = \sum_{i \in I(s)} W_i\, S(i,0), \qquad H = \sum_{i \in I(s)} W_i\, S(0,i), \qquad G = \sum_{i \in I(s)} W_i \sum_{j \in I(s)} W_j\, S(i,j), \]

and $W_i$, the cumulative weight of the vertices at the vertical ith
boundaries of the row $2^s$, is

\[ W_i = \sum_{j \in I(2^{s-1-i})} \left( w(2^i + (j-1)2^{i+1}) + w(2^i + (j-1)2^{i+1} + 1) \right) \quad \text{for } i \in I(s-1). \]

We are going to use the following lemma on the inequality of sums of
two sequences of real numbers to show how the average stretch depends on
the weight distribution derived from some duplicating pattern $(a_1, \ldots, a_s)$.

Lemma 4.8.

For two sequences of real numbers $(b_1, \ldots, b_s)$ and $(c_1, \ldots, c_s)$,
if $\sum_i i\,c_i < \sum_i i\,b_i$ and $\sum_i c_i = \sum_i b_i$,
then

\[ \sum_i c_i \sum_j c_j (i+j) < \sum_i b_i \sum_j b_j (i+j). \]

Proof:

\[ \sum_i c_i \sum_j c_j (i+j) = \sum_i i\,c_i \sum_j c_j + \sum_i c_i \sum_j j\,c_j \]
\[ = \sum_i i\,c_i \sum_j b_j + \sum_i b_i \sum_j j\,c_j \]
\[ < \sum_i i\,b_i \sum_j b_j + \sum_i b_i \sum_j j\,b_j \]
\[ = \sum_i b_i \sum_j b_j (i+j). \]

□
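A quick numerical instance may help to see what the lemma says. The sequences below are made up for the illustration (they are not taken from the paper): both sum to 6, but c puts more of its mass on small indices than b does. The function name `double_sum` is hypothetical.

```python
def double_sum(seq):
    """Compute sum_i sum_j seq_i * seq_j * (i + j) with 1-based indices."""
    return sum(x * y * (i + j)
               for i, x in enumerate(seq, start=1)
               for j, y in enumerate(seq, start=1))


def weighted_sum(seq):
    """Compute sum_i i * seq_i with 1-based indices."""
    return sum(i * x for i, x in enumerate(seq, start=1))


b = [1, 2, 3]
c = [3, 2, 1]
```

Here the weighted sums are 10 for c and 14 for b, and the double sums are 120 and 168, in line with the identity used in the proof: the double sum equals twice the weighted sum times the plain sum.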


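Theorem 4.4.

If

\[ \sum_i i\, \frac{W'_i}{a'_s+1} < \sum_i i\, \frac{W''_i}{a''_s+1}, \]

where $W'_i$ and $W''_i$ are the cumulative weights
at the ith boundaries of $G_{(A^*)'}$ and $G_{(A^*)''}$ respectively, and (A*)' and (A*)''
are extended arrays under $(a'_1, \ldots, a'_s)$ and $(a''_1, \ldots, a''_s)$ respectively, then
ES' < ES''.

Proof:

With $n = 2^s + t$,

\[ ES = \frac{1}{n(n-1)} \left[ 6 + \sum_i \frac{W_i}{a_s+1}(18 + 8i) + \sum_i \frac{W_i}{a_s+1} \sum_j \frac{W_j}{a_s+1}(12 + 4i + 4j) \right], \]

so that

\[ ES' - ES'' = \frac{1}{n(n-1)} \Bigg[ 8\left( \sum_i i\,\frac{W'_i}{a'_s+1} - \sum_i i\,\frac{W''_i}{a''_s+1} \right) + 18\left( \sum_i \frac{W'_i}{a'_s+1} - \sum_i \frac{W''_i}{a''_s+1} \right) \]
\[ \qquad + 12\left( \sum_i \frac{W'_i}{a'_s+1} \sum_j \frac{W'_j}{a'_s+1} - \sum_i \frac{W''_i}{a''_s+1} \sum_j \frac{W''_j}{a''_s+1} \right) \]
\[ \qquad + 4\left( \sum_i \frac{W'_i}{a'_s+1} \sum_j \frac{W'_j}{a'_s+1}(i+j) - \sum_i \frac{W''_i}{a''_s+1} \sum_j \frac{W''_j}{a''_s+1}(i+j) \right) \Bigg]. \]

We can show that

\[ \sum_i \frac{W'_i}{a'_s+1} = \sum_i \frac{W''_i}{a''_s+1} = \frac{n}{2} - 1, \]

and by the hypothesis,

\[ \sum_i i\,\frac{W'_i}{a'_s+1} < \sum_i i\,\frac{W''_i}{a''_s+1}. \]

Therefore, it follows from the preceding lemma that

ES' - ES'' < 0. □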


r:r  i 'T"  »t* 

_ 


Since the set of arcs in a graph with no duplication is uniformly distributed, the average stretch ES is the total stretch TS of all arcs divided by the number of arcs. The total stretch TS can be [...] number of vertices at the horizontal i-th boundary and the vertical j-th boundary
CHAPTER 5

DUPLICATING  PATTERNS 

5.1.  Feasible  Duplicating  Patterns 

It has been mentioned briefly that a duplicating pattern is a sequence of nonnegative integers indicating exactly the number of duplications at each boundary. A duplicating pattern cannot be an arbitrary sequence, since there are constraints both on the amount of duplication and on the number of boundaries. A feasible duplicating pattern is defined precisely as follows:

Let $A^*$ be the extended array of an $n \times n$ array, where $n = 2^s + t$, for some $t \in I(2^s - 1)$, under the duplicating pattern $(a_1, a_2, \ldots, a_s)$. Then $(a_1, a_2, \ldots, a_s)$ is a feasible duplicating pattern if the $a_i$'s satisfy the equation

$\sum_{i=1}^{s} a_i \, 2^{s-i} = 2^s - t$     (5.1)

where $2^{s-i}$ is the number of vertical $i$-th boundaries in a row and $2^s - t$ is the maximum number of duplications in a row.
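Equation (5.1) can be checked mechanically. The sketch below is a minimal feasibility test; the function name `is_feasible` is hypothetical, and it assumes that $I(k)$ denotes the set {1, ..., k}. The sample patterns are the ones discussed later in this chapter for s = 5 and t = 1.

```python
def is_feasible(pattern, t):
    """Check equation (5.1): sum_{i=1..s} a_i * 2**(s-i) == 2**s - t.

    `pattern` is the tuple (a_1, ..., a_s); s is taken from its length.
    Assumes I(k) = {1, ..., k}, so t must lie in 1 .. 2**s - 1.
    """
    s = len(pattern)
    if not 1 <= t <= 2**s - 1:
        return False
    if any(a < 0 for a in pattern):
        return False
    total = sum(a * 2 ** (s - i) for i, a in enumerate(pattern, start=1))
    return total == 2**s - t


print(is_feasible((0, 0, 0, 0, 31), 1))  # True: all duplication at the highest boundary
print(is_feasible((0, 1, 1, 1, 17), 1))  # True: 8 + 4 + 2 + 17 = 31
print(is_feasible((0, 1, 2, 4, 7), 1))   # True: 8 + 8 + 8 + 7 = 31
print(is_feasible((0, 1, 2, 4, 7), 2))   # False: 31 != 2**5 - 2
```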

We  are  only  interested  in  feasible  duplicating  patterns;  thus,  for 
brevity,  a duplicating  pattern  means  a feasible  duplicating  pattern.  In 
the  remaining  sections  of  this  chapter,  the  effect  of  duplicating  patterns 
on  the  average  stretch  will  be  investigated. 

5.2.  Some  Special  Duplicating  Patterns 
5.2.1.  No  duplication 

For a graph $G_A = (V,E)$ of an $n \times n$ array A with no duplication, all the arcs in E are equally probable (refer to Corollary 4.1). Therefore, the average stretch will be the total stretch divided by the cardinality of E. When n is a power of 2, let $n = 2^s$. The total stretch TS is defined by the following recurrence, where log denotes the base-2 logarithm:

$TS(1) = 0$

$TS(n) = 4\,TS(n/2) + 2n(2 \log((n/2)^2) + 2) + 2n(2 \log((n/2)^2) + 4)$,

where the first term on the right hand side of the equation is the total stretch of arcs within each of the four $n/2 \times n/2$ subarrays (refer to Figure 12), the second term is the stretch of all the arcs crossing the vertical $s$-th boundary, and the last term is the stretch of all arcs crossing the horizontal $s$-th boundary.


Figure 12. Illustration of the recurrence of the total stretch: the horizontal and vertical s-th boundaries split the array into four n/2 x n/2 subarrays, each with total stretch TS(n/2).
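The recurrence can be compared against a direct count on small arrays. The sketch below assumes a particular storage mapping that is not spelled out in this section: each cell is sent to the leaf whose index interleaves the bits of its row and column, with the row bit above the column bit at each level, and the stretch of an arc is the number of tree edges between the two leaves. Under that assumption an arc across the top-level vertical boundary has stretch 4s - 2 and one across the top-level horizontal boundary has stretch 4s, which matches the two boundary terms of the recurrence. All function names are illustrative.

```python
def leaf_index(row, col, s):
    """Interleave the s bits of row and col (row bit above column bit)."""
    idx = 0
    for b in range(s):
        idx |= ((col >> b) & 1) << (2 * b)
        idx |= ((row >> b) & 1) << (2 * b + 1)
    return idx


def path_length(x, y):
    """Number of edges between leaves x and y of a complete binary tree."""
    return 2 * (x ^ y).bit_length()


def total_stretch_bruteforce(s):
    """Sum the stretch over all directed arcs of a 2**s x 2**s array."""
    n = 2**s
    total = 0
    for r in range(n):
        for c in range(n):
            here = leaf_index(r, c, s)
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n and 0 <= cc < n:
                    total += path_length(here, leaf_index(rr, cc, s))
    return total


def total_stretch_recurrence(n):
    """TS(n) from the recurrence; n must be a power of 2."""
    if n == 1:
        return 0
    log_half = (n // 2).bit_length() - 1          # log2(n/2)
    vertical = 2 * n * (2 * (2 * log_half) + 2)   # arcs across the vertical boundary
    horizontal = 2 * n * (2 * (2 * log_half) + 4) # arcs across the horizontal boundary
    return 4 * total_stretch_recurrence(n // 2) + vertical + horizontal


for s in range(1, 5):
    print(2**s, total_stretch_bruteforce(s), total_stretch_recurrence(2**s))
```

For n = 2 and n = 4 both computations give 24 and 208 respectively.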

The solution to this recurrence equation is

$TS(n) = 28n^2 - 16\,n \log n - 28n$.

There are $4n(n-1)$ edges in E, thus the average stretch ES is

$ES = \frac{28n^2 - 16\,n \log n - 28n}{4n^2 - 4n}$,

which is asymptotically equal to 7 as n approaches infinity.
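The closed form can be checked against the recurrence, and the approach of ES to 7 can be observed numerically. The following Python sketch is illustrative only (the function names are not from the original); it works with the exponent s, where n = 2^s, and uses exact rational arithmetic.

```python
from fractions import Fraction


def ts_recurrence(s):
    """TS(2**s) from the recurrence, using log((n/2)**2) = 2(s-1)."""
    if s == 0:
        return 0
    n = 2**s
    return (4 * ts_recurrence(s - 1)
            + 2 * n * (4 * (s - 1) + 2)
            + 2 * n * (4 * (s - 1) + 4))


def ts_closed(s):
    """TS(n) = 28 n^2 - 16 n log n - 28 n with n = 2**s."""
    n = 2**s
    return 28 * n * n - 16 * n * s - 28 * n


def average_stretch(s):
    """ES = TS(n) / (4 n (n - 1)) for n = 2**s, s >= 1."""
    n = 2**s
    return Fraction(ts_closed(s), 4 * n * (n - 1))


for s in (1, 2, 5, 10, 20):
    assert ts_recurrence(s) == ts_closed(s)
    print(2**s, float(average_stretch(s)))
```

Dividing numerator and denominator by 4n shows that ES = 7 - 4 log n / (n - 1), so the gap to 7 shrinks roughly like (log n)/n.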


When n > 2 is not a power of 2, there is no simple recurrence for solving the total stretch TS since the storage mapping is not symmetrical about the s-th boundaries of the array. Equation (4.5) has to be used, and $m_{i,j}$, the number of vertices at a horizontal $i$-th boundary and a vertical $j$-th boundary, is given below.

mi  ^ - XjX^  for  i,j  € I(s),  n - 2S  + t,  1 ^ t ^ 2S 

m0,0  " 1 


Xi  “ 2S  + 2 — — + y±  for  i € l(s-l) 


where  y±  - f -1  if  --~+f  — (2+21+1)  - 21+1  > t - 2l  + 1 


0 otherwise 


and  X. 


kl  k2 


Thus,  ES  * 2 2 m.  . s(i,j)  - n*(8k-6)  where  k * k.  if  t*2  +2  +...+2 

i*0  j=0  & 


$k_1 > k_2 > \ldots > k_l$. Instead of simplifying the expression on the right hand side, which requires some tedious algebraic manipulations, we evaluate ES for $n = 2^s + 1$, $2^s + 2^{s-1}$, $2^s + 2^s - 1$. The numerical results are shown in Table 1. From these results, we can say that for these particular values of n, ES is asymptotic to 7. Since these values of n are the two extremes and the median of the interval between two consecutive powers of 2, it is reasonable to say that ES is asymptotically equal to 7 for all integers n. This value will be used as the reference to measure the improvement of ES when duplication is introduced.

In the following subsections, three special duplicating patterns will be investigated.
5.2.2. $(0, 0, \ldots, 2^s - t)$

When n is not a power of 2, there are many ways to extend an $n \times n$ array to a $2^{\lceil \log n \rceil} \times 2^{\lceil \log n \rceil}$ array. Arcs whose images cross a high boundary correspond to a large stretch under the storage mapping, which contributes heavily to the average stretch. Thus, we would want to avoid using arcs which cross the highest boundaries. This can be done by placing duplicates of those records, which these arcs are trying to reach across the highest boundaries, in the same subarrays from which these arcs are directed. The first duplicating pattern $(0, 0, \ldots, 0, 2^s - t)$ that we investigate for a $(2^s+t) \times (2^s+t)$ array does exactly what we have just described.

Algorithm 4.1 can be used to generate the weight distribution and then equation (4.4) is applied to obtain the average stretch ES(n). For the pattern $(0, 0, \ldots, 2^s - t)$, Algorithm 4.1 gives the following weight distribution:

i      1                    2            ...   t+1          t+2        ...   2^s
w(i)   (3/4)(2^s - t + 1)   2^s - t + 1  ...   2^s - t + 1  2^s - t    ...   1

For $t = 1, 2^{s-1}, 2^s - 1$, closed form expressions for the $W_i$'s used in equation (4.4) can be easily obtained from the weight distribution (refer to Appendix B). Numerical results are used to study the behavior of $ES_{A^*}$. The improvement in $ES_{A^*}$ with duplication can be measured in terms of the figure of merit $\frac{ES_A - ES_{A^*}}{ES_A} \times 100\%$. Table 2 shows the average stretch $ES_{A^*}(n)$ and the improvements for $n = 2^s+1$, $2^s+2^{s-1}$, $2^s+2^s-1$, $s \in \{2, 3, \ldots, 34\}$. The improvements diminish rapidly to zero as n increases. Therefore $(0, 0, \ldots, 2^s - t)$ is certainly not a good duplicating pattern.
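The vanishing improvement can be illustrated for t = 1. The Python sketch below rests on two assumed readings of formulas stated elsewhere in this report: the closed form $W_i = (2^s+1)2^{s-i-1}$ for $i < s$ and $W_s = 1$ (Appendix B, item 1a), and the expression for ES in terms of the normalized weights $W_i/(a_s+1)$ that appears in the proof of Theorem 4.4. The reference value 7 is the asymptotic average stretch without duplication. Function names are illustrative.

```python
from fractions import Fraction


def cumulative_weights_t1(s):
    """Assumed reading of Appendix B, item 1a: pattern (0,...,0,2^s - 1), t = 1."""
    return [Fraction((2**s + 1) * 2 ** (s - i - 1)) for i in range(1, s)] + [Fraction(1)]


def average_stretch(n, weights, a_s):
    """ES from the normalized weights, as read from the proof of Theorem 4.4."""
    w = [W / (a_s + 1) for W in weights]
    single = sum(x * (18 + 8 * i) for i, x in enumerate(w, start=1))
    double = sum(
        x * y * (12 + 4 * (i + j))
        for i, x in enumerate(w, start=1)
        for j, y in enumerate(w, start=1)
    )
    return (6 + single + double) / (n * (n - 1))


def improvement_percent(es_dup, es_ref=7):
    """Figure of merit relative to the reference average stretch."""
    return (es_ref - es_dup) / es_ref * 100


for s in (5, 10, 15, 20):
    n = 2**s + 1
    es = average_stretch(n, cumulative_weights_t1(s), a_s=2**s - 1)
    print(n, float(es), float(improvement_percent(es)))
```

Under these assumptions the computed ES works out to $7 - 4s/2^s - 4/4^s$, so the improvement over 7 shrinks quickly as s grows, in line with the behaviour described above.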

There is nothing that can be done for $n = 2^s + 2^s - 1$, since $(0, 0, \ldots, 0, 1)$ is the only feasible duplicating pattern. There is more freedom of choice of duplicating patterns for $n < 2^{s+1} - 1$, and this is maximum for $n = 2^s + 1$. Therefore, the remaining two subsections will be dedicated to the study of the case $n = 2^s + 1$, in order to learn the effect of the choice of $a_i$'s on the average stretch.

5.2.3. $(0, 1, 1, \ldots, 1, 2^{s-1}+1)$

We have seen that duplicating only at the highest boundaries is not a good strategy. In this section, a more uniform distribution of $a_i$'s, namely $(0, 1, 1, \ldots, 1, 2^{s-1}+1)$, for $n = 2^s + 1$, is investigated. This duplicating pattern allows one duplication near each boundary except the lowest and the highest boundaries. It allows no duplication near the lowest boundaries and gives whatever number of duplications remain to the highest boundaries.

Observing the weight distribution w(i)'s, closed form expressions of the $W_i$'s in terms of n and i are obtained after some long algebraic manipulation; they are shown in Appendix B. These $W_i$'s are substituted into equation (4.4) to calculate the average stretch $ES_{A^*}$. The results show that for a large value of n, $ES_{A^*}$ is approximately 6.33 (and this appears to be the asymptotic value), which produces an improvement of about 9.5%. When n is smaller, the improvement is larger (Table 3).

For $n = 2^s + 1$, $(0, 1, 1, \ldots, 1, 2^{s-1}+1)$ is superior to $(0, 0, \ldots, 0, 2^s - 1)$. For arbitrary $2^s + 1 < n < 2^{s+1} - 1$, it is also expected that duplication at lower boundaries is desirable: indeed, the number of low boundaries is very large, which corresponds to a high frequency of low-boundary crossing in the random-walk hypothesis we have made.

5.2.4. $(0, 1, 2, 4, 4, \ldots, 4, 7)$

We have seen that $(0, 1, 1, \ldots, 1, 2^{s-1}+1)$ is a better duplicating pattern than $(0, 0, \ldots, 0, 2^s - 1)$ for $n = 2^s + 1$. Are there patterns which are still better? In this section, this question is answered by showing that $(0, 1, 2, 4, 4, \ldots, 4, 7)$, for $n = 2^s + 1$, $s = 5, 6, \ldots$, is better.

Algorithm 4.1 is used to generate the weight distribution w(i)'s for s = 5 and the $W_i$'s used in equation (4.4) are calculated. Since the $a_i$'s, $i = 4, 5, \ldots, s-1$, have the same value, the $W_i$'s for s = 5 are easily extended to those for s > 5. The closed form expressions for these $W_i$'s are given in Appendix B.

Table 4 shows that for large values of n, $ES_{A^*}$ approaches 6.13, which gives an improvement of 12.4%. This duplicating pattern is a combination of the heuristic criteria presented in the previous two sections, namely (i) crossing of high boundaries contributes heavily to the average stretch and (ii) low boundaries are frequently crossed. Therefore, this pattern places duplications at higher boundaries and, at the same time, at lower boundaries.
Average stretch and improvement I under (0,1, 2, 4, 4


This  pattern  seems  to  be  quite  good.  Is  it  optimal?  For  arbitrary  n, 
what  is  a good  duplicating  pattern?  These  two  questions  will  be  examined 
in  the  remaining  section  of  this  chapter. 

5.3.  Optimal  Duplicating  Patterns 

An optimal duplicating pattern for an $n \times n$ array A is one under which A is extended to $A^*$ and $ES_{A^*}$ is smallest among all $ES_{A'}$, where $A'$ is an extended array of A under some duplicating pattern.

An optimal duplicating pattern can be found by exhaustive search. The algorithm is given as follows.

Algorithm 5.1.

Input: An $n \times n$ array A, where $n = 2^s + t$.

Output: An optimal duplicating pattern for A.

Method:

1. Call the procedure PG(s,t) to generate the set DP of all feasible duplicating patterns for A.

2. For each pattern in DP, use Algorithm 4.1 to obtain the probability distribution of the set of vertices in the graph of the extended array $A^*$ of A. Then apply equation (4.4) to obtain the average stretch $ES_{A^*}$.

3. Output a duplicating pattern $d \in DP$ such that $A^*$ is the extended array of A under d and $ES_{A^*} \le ES_{A'}$ for every extended array $A'$ of A under a pattern in DP.
Procedure PG(s,t)

DP ← ∅;
a_s ← 2^s - t;  a_i ← 0, i = 1,2,...,s-1;
j ← s-2;
while j > 0 do
begin
    j ← s-2;
    while a_s ≥ 0 do
    begin
        add (a_1,a_2,...,a_s) to DP;
        a_{s-1} ← a_{s-1} + 1;
        a_s ← a_s - 2;
    end;
    repeat
        if j ≠ 0 then a_j ← a_j + 1;
        for i ← j+1 until s-1 do a_i ← 0;
        a_s ← 2^s - t - Σ_{i=1}^{s-1} 2^{s-i} a_i;
        if a_s < 0 then j ← j - 1
    until a_s ≥ 0;
end;
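The enumeration and the exhaustive search of Algorithm 5.1 can be sketched in Python. This is not a transcription of procedure PG: it is a plain recursive enumeration of the solutions of equation (5.1), offered as an alternative that produces the same set. The names `feasible_patterns` and `optimal_pattern` are hypothetical, and `average_stretch` stands for a caller-supplied function that would have to implement Algorithm 4.1 and equation (4.4), which are not reproduced here.

```python
def feasible_patterns(s, t):
    """Yield every tuple (a_1, ..., a_s) of nonnegative integers with
    sum_{i=1..s} a_i * 2**(s-i) == 2**s - t   (equation (5.1))."""

    def extend(prefix, remaining):
        i = len(prefix) + 1
        if i == s:
            # a_s has weight 2**0 = 1, so it absorbs whatever is left.
            yield prefix + (remaining,)
            return
        weight = 2 ** (s - i)
        for a in range(remaining // weight + 1):
            yield from extend(prefix + (a,), remaining - a * weight)

    yield from extend((), 2**s - t)


def optimal_pattern(s, t, average_stretch):
    """Exhaustive search: return a feasible pattern minimizing average_stretch.

    `average_stretch` is a hypothetical callable mapping a pattern to ES.
    """
    return min(feasible_patterns(s, t), key=average_stretch)


for pattern in feasible_patterns(3, 1):
    print(pattern)
```

For s = 3 and t = 1 the generator yields six patterns, among them (0, 1, 5). The number of patterns grows quickly with s, which is why exhaustive search becomes impractical for large arrays.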
5.4. Near Optimal Duplicating Patterns for $n = 2^s + 1$

For large values of s and small values of t, there are so many duplicating patterns that it becomes impractical to obtain an optimal duplicating pattern for each n by Algorithm 5.1. It is preferable to have a general duplicating scheme which guarantees small average stretch for every n. The two heuristics used to construct an efficient duplicating scheme are as follows:

1) An array composed of smaller arrays with good structure has a good structure;

2) If the $W_i$'s are small on the lower end and large on the higher end, then $\sum_{i=1}^{s} i \, \frac{W_i}{a_s+1}$ is small, which implies small average stretch.

These will be explained and justified in more detail in this section.
Theorem 5.1.

If A is a $(2^s+1) \times (2^s+1)$ array, s is an integer greater than or equal to 6, and the duplicating pattern is $(0, 1, 2, 4, \ldots, 4, a_{s-1}, a_s)$, then the minimum value of $ES_{A^*}$ is achieved when

$(a_{s-1}, a_s) = (4, 7)$ if $6 \le s \le 9$,
$(a_{s-1}, a_s) = (5, 5)$ if $s \ge 10$.

Proof: (by exhausting all the possible combinations of $(a_{s-1}, a_s)$)

Since $\sum_{i=1}^{s} a_i \, 2^{s-i} = 2^s - 1$ and $(a_1, a_2, \ldots, a_{s-2}) = (0, 1, 2, 4, \ldots, 4)$, $2a_{s-1} + a_s$ must be equal to 15. Thus, there are 8 possible pairs $(a_{s-1}, a_s)$. They are $(i, 15-2i)$ for $i = 0, 1, \ldots, 7$. Recall that the $W_i$'s are the cumulative weights at the $i$-th boundaries. For each $(a_{s-1}, a_s)$ compute the difference $D_j$, $j = 0, 1, \ldots, 7$,

$D_j = \sum_{i=1}^{s} i \, \frac{W_i^{(4)}}{a_s^{(4)}+1} - \sum_{i=1}^{s} i \, \frac{W_i^{(j)}}{a_s^{(j)}+1}$,

where $W_i^{(j)}$ are the cumulative weights when $(a_{s-1}, a_s) = (j, 15-2j)$. The closed form expressions for these $W_i^{(j)}$ shown in Appendix B are used in computing $D_j$. The $D_j$ are:

$D_0 = 277/60 - 123/80 \cdot s$
$D_1 = 89/28 - 433/280 \cdot s$
$D_2 = 313/360 - 27/120 \cdot s$
$D_3 = 23/120 - 3/40 \cdot s$
$D_5 = -29/120 + 3/120 \cdot s$
$D_6 = -151/280 - 3/280 \cdot s$
$D_7 = 17/40 - 9/40 \cdot s$

For $s \ge 6$, $D_j$ is negative when $j = 0, 1, \ldots, 3, 6, 7$; when $j = 5$, $D_5$ is negative for $6 \le s \le 9$ and is positive when $s \ge 10$. Hence, the theorem follows from Theorem 4.5. □
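The sign analysis in the proof can be reproduced with exact arithmetic. The Python sketch below tabulates the listed values of $D_j$ as constant and slope in s; it assumes that $D_j$ is the index-weighted sum for the pair (4, 7) minus that for the pair (j, 15 - 2j), so that the pair with the largest $D_j$ (taking $D_4 = 0$) has the smallest index-weighted sum. Names are illustrative.

```python
from fractions import Fraction as F

# D_j = constant + slope * s, as listed in the proof of Theorem 5.1.
D = {
    0: (F(277, 60), F(-123, 80)),
    1: (F(89, 28), F(-433, 280)),
    2: (F(313, 360), F(-27, 120)),
    3: (F(23, 120), F(-3, 40)),
    5: (F(-29, 120), F(3, 120)),
    6: (F(-151, 280), F(-3, 280)),
    7: (F(17, 40), F(-9, 40)),
}


def d(j, s):
    """Evaluate D_j at a given s."""
    const, slope = D[j]
    return const + slope * s


def best_pair(s):
    """Return the pair (a_{s-1}, a_s) with the largest D_j, where D_4 = 0."""
    values = {4: F(0)}
    values.update({j: d(j, s) for j in D})
    j = max(values, key=values.get)
    return (j, 15 - 2 * j)


for s in range(6, 13):
    print(s, best_pair(s), float(d(5, s)))
```

The only sign change occurs in $D_5 = (3s - 29)/120$, which is negative up to s = 9 and positive from s = 10 on.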

Since $D_5$ is $O(\log(n-1))$, the difference in the average stretch between the patterns $(0, 1, 2, 4, \ldots, 4, 7)$ and $(0, 1, 2, 4, \ldots, 5, 5)$ is $O(1/n)$, which approaches zero when n is large. Numerical results show that for $s \ge 24$, the average stretches for both patterns are the same.

Using Algorithm 5.1, the optimal duplicating patterns for $n = 2^s + 1$ and $s = 3, 4, 5, 6$ are found to be (0,1,5), (0,1,2,7), (0,1,2,4,7) and (0,1,2,4,4,7) respectively. Thus, the duplicating pattern $(0, 1, 2, 4, \ldots, 4, 7)$ is inductively an efficient pattern. With the evidence of Theorem 5.1, a reasonable conjecture is that $(0, 1, 2, 4, \ldots, 4, 7)$ is near optimal for $n = 2^s + 1$.

When n is not equal to $2^s + 1$, $(0, 1, 2, 4, \ldots, 4, 7)$ certainly cannot be an optimal duplicating pattern since it is not a feasible duplicating pattern. Further research is needed in order to obtain some near optimal duplicating schemes for $n \ne 2^s + 1$.
CHAPTER 6
CONCLUSION 

Any  scheme  that  stores  arrays  in  a linear  memory  induces  unbounded 
average  loss  of  proximity.  DeMillo,  Eisenstat  and  Lipton  [4]  have 
shown  that  when  square  arrays  are  stored  in  a binary  tree  memory  structure, 
the  average  loss  of  proximity  is  bounded  by  a constant  7.  In  this  paper, 
duplication  of  items  of  an  array  is  introduced  to  decrease  the  average 
loss  of  proximity  in  arrays  when  they  are  stored  in  a binary  tree  memory 
structure. It is shown that some duplication can induce as much as a 12%
decrease  in  average  loss  of  proximity.  Moreover,  the  duplication  ratio 
is  limited  in  these  schemes,  since  we  do  not  use  a deeper  binary  tree 
memory  structure  BTMS  than  is  needed  when  no  duplication  is  used. 

Therefore,  the  duplicating  patterns  defined  in  this  paper  yield  a very 
effective  space-time  tradeoff. 

In Chapter 5, it is shown that duplication at low boundaries is as important as at high boundaries in reducing the average loss of proximity in arrays. The distribution of the number of duplications at the various boundaries must be carefully chosen in order to achieve a decrease in the average loss of proximity.
APPENDIX A


The proof of Theorem 4.2 is given as follows.
By backward induction on L.
Basis: L = s


FWG(l,s)  generates  wS(i)  * a +1  for  i € I(i,  -a  ) 

3 S S 

and  w^i)  = L + 1 - i for  i € 1(2  ) - I(£  -a  ) 
s s s s 

Since  a ^ 1,  wS(£  -1)  =>  2 
s s 

By  Lemma  4.6,  w(Z  -l,j)  is  set  to  Vs  (l  -ljw3 ( j)  * 2wS(j) 
s s 

j € icy 

Since  w(l)  is  set  to  3/4(a  +1)  and  by  Lemma  4.6, 

s 

w(l,l)  is  set  to  l/2(a  +1)^ 

3 

Now,  we  show  that  w(i),  for  all  i € I (Z  ),  satisfies 

s 

equation  (4.3) 


vr(l)  = 2 • w(l)  • 1/3  + V(2)  • 1/4 


3/4(a  +1) 


r„- 


w(2) 


if  2 <Z  -a 
s s 


2w(2)  *l/4  + w(l)  *1/3  + w(3)  *1/4 

2w(2)  *l/4  + w(l)  *l/3  + w(3)  *l/4  + w(i  )l/4  if  2 -X  - 

s s 

w(l)  *1/3  +TJ(1)  *1/3 
l/2(a  +1)  + l/4(a  +1)  + l/4(a  +1) 

3 S S 

1/2 (a  +1)  + l/4(a  +1)  + 1/4  a + 1/4 
s s s 


a 

s s 


if  2>Z  -a 
a a 


,l/2(ag+l) 


+ 1 


if  2SA  -a 
s s 


Z + 1-2 
W(i)  - 2W(i)  *l/4  + w(i-l)  •l/4  + w(i+l)  *1/4,  i € l(Z  -a  -1)-I(2)

s s 

= l/2(a  +1)  + l/4(a  +1)  + l/4(a  +1) 
s s s 

* a + 1 

3 

w(i)  = 2w(i)  *l/4  + w(i-l)  •l/4  + W(i+i)l/4+w(£  ) *1/4,  W -aai<2 

' ' s s s 

=■  l/2(a  +1)  + l/4(a  +1)  + 1/4 -a  + 1/4 
s s s 

= a +1 
s 

w(i)  =-  2w(i)  *l/4  + w(i-l)  *l/4  + w(i+l)  *1/4,  i € I(i  -1) -1(1  -a  ) 

S S3 

- 1/2(1  +l-i)  + 1/4  (X  +l-i+l)  + 1/4(4  +1-1-1) 

s s s 

= 4 + 1 - i 
s 

w(i)  = 2w(i)  *1/4  + w(i-l)  *1/4  ia4_ 

s 

- 1/2 +1/2 

= 1 

Assume  that  L * k,  the  algorithm  correctly  generates  the  weight 
distribution  w^ (i) , i € I(2s‘k£k)>  that  is  wk(i)  satisfies  equation 
(4.2)  for  i € I(2S'kJlk). 


Suppose  i > 0,  that  is  L = k-1,  the  loop  LP  in  algorithm  4.1  has  to  be 

•— lc  s -lc 

executed  once  more  after  w (i)  , i € 1(2  4^)  , are  found. 

Each  BWG(f,k)  generates  the  weights  of  a^  cells  which  contain 

duplicate  copies  of  some  records  and  their  copies  the  next  4^  - a^  weights 

from  w 's.  Thus,  each  BWG  will  shift  the  records  aj^  places  to  the  right. 

— k+1 

This  explains  the  term  d^’a^  in  the  parameter  of  w in  procedures  FWG 


and  BWG. 


At  each  (k-1)  boundary,  records  are  duplicated  and  each  of 

s — Ic 

the  2 sections  of  length  Z k is  divided  into  2 sections.  Therefore, 
s - Oc* 

there  are  2 sections,  each  of  length  (ifc  + a p/2  when  the  loop 

LP  is  executed  the  (s  + l-(k-l))th  time.  FWG  and  BWG  are  alternately 


applied  to  each  section. 


FWG(l+(i-l)^k_1,k-l)  generates  wk_1(j),j  € I(i  -I((i-l)j&k  p 


— — » * * . f CV”  i. 

i 6[l,3,...,2S'k+1-l} 

For  j € Ki-Vrvr1)  ' KCi-D-Vi) 

wk"1(j)  - wC'1(j-l)  *1/4  + wk-1( j+1)  *1/4  + 2wk"1(j)  -1/4 

= l/4(4wk(j-dk_1ak_1)) 

For  j = i-J^  - ak_1 

wk_1(j)  = wc"1(j-l)  •l/4  + wk_1(j+l)  •l/4  + wc-1(i*jek  l+l)  *1/4 
+ 2wk-1(j)  -1/4 

- l/4[wk(j-l-dk_1ak_1  + wk-1(j+l)  + 


”k(1'1k-i+1-(\-i+1)ak-l>  - sk'1<1-‘k.rak-i) 
For  j € - i(i-ik.r\.i) 

^■l(J)  - l/4(wk”1(j -l)4wk’1(j+l)+2wk_1(j) ) 

* W'i*k.1«)'(sk'l(i'ik.i-s.1)*lk'I(1'‘k.i-S.i)l 
" 1/<Vi+l),^’1<l-\-rVi)+xk’l(1'\-i‘  Vi» 


w _1(j)  * 1/4*1/(ak.i+1),(^k"1(i‘-ek-rak.i)+xk"1(i,ik-rak.i)> 

The  procedure  BWG(l+i  •X^.k-l)  generates  wk_1(j),s 

for  j € I((l+l)Jik_1)  . i(i.2k_1)  and  i € {2,4 2s*k+1} 

For  j € Ki-Vi+Vl)  • 1<i*\-l+1) 
wk*1(j)  - l/4(i^-1(j-l)  + Wk"1(j+1)  + 2wk-1(j)) 

- l/4(wk(j.l.(dk_1+l)ak_1-wk-1(j-l-ak_1) 

+ ^k(j+l-(dk.1+Dak_1)-wk'1(j+l-ak_1) 

+ 2wk(j.(dk_1+l)ak_1)-2^-1(j-ak,1)) 

- l/4(4^(j-(dk_1+l)ak_1)-4^k-:L(j-ak_1)) 

For  j - i'2,^+1 

^(j)  - l/4(wk”1(j+l)  + 2wk_1(j)) 

- l/4(wk(j+l-(dk_1+l)ak_1)-wk"1(j+l-ak_1) 

+ 2w^(j.(dk_1+l)ak_1).2^-1(j-ak_1)) 

- l/4(4(wk(j-(dkl+l)ak_1)-wk(j-ak_1))) 

For  j - i-Vi  + Vi+i 

^^(j)  - l/4(^k(j-l-(dk_1+l)ak_1)-wk-1(j-l-ak_1) 

+ 2wk0+l-(<lk_1+1)a](_1)+»k'1(j.ak_1-1) 

- ,+l)«.  ,) 


For  j € Kd+D^^.l)  - I(l,j6k-1+«k_1+l) 
i^"l(J)  ” l/4(wk'1(j-l)+wk"1(j+l)  + 2wk'1(j)) 

» l/4(wk(j-l-(dkl+l)ak_1)  + ^k(Jfl-(dk-1+l)«k-1) 

+ 2sk<J-(\-i+1)Vi» 

For  j - (i+1)^  ! 

^C“1(J)  » l/4(wk-1(j-l)  + 2wk'1(j)) 

- 1/4^  (i-l-(dk_1+l)ak_1)  + 2^(j-(dk.1+1)ak.1)) 

- mC^CC^j+DJ^-l)  + 2wk((dk_1+l  )lk) 

In  the  proof,  we  claimed  that  (i+l)^k_1  - (dk_1+l)ak  L = (^  1+1)-ejc* 
It  is  true  because  d^j+l  * (i+l)/2  and  £k_1  - (\+\  j_ )/2 • 
APPENDIX B

The closed form expressions for the cumulative weights for some duplicating patterns are as follows.

1. $(0, 0, \ldots, 2^s - t)$

a. $t = 1$

$W_i = (2^s+1)\,2^{s-i-1}$,  $i = 1, 2, \ldots, s-1$

$W_s = 1$

b. $t = 2^{s-1}$

$W_i = 2^{s-i-1}(2^{s-1}+1) + 2^{s-i-2}(2^{s-1}+1)$,  $i = 1, 2, \ldots, s-2$

$W_{s-1} = 2^s + 1$

$W_s = 1$

c. $t = 2^s - 1$

$W_i = 2^{s-i+1}$,  $i = 1, 2, \ldots, s-1$

$W_s = 1$
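The three closed forms of item 1 can be cross-checked against the normalization used in the proof of Theorem 4.4. The Python sketch below assumes the closed forms read as $W_i = (2^s+1)2^{s-i-1}$ for t = 1, $W_i = (2^{s-1}+1)(2^{s-i-1} + 2^{s-i-2})$ with $W_{s-1} = 2^s+1$ for $t = 2^{s-1}$, and $W_i = 2^{s-i+1}$ for $t = 2^s - 1$, each with $W_s = 1$, and that the normalized total $\sum_i W_i/(a_s+1)$ should equal $n/2 - 1$ with $a_s = 2^s - t$. The function names are illustrative, and the sketch requires $s \ge 2$ so that the three values of t are distinct.

```python
from fractions import Fraction


def weights_item1(s, t):
    """Cumulative weights W_1..W_s for the pattern (0,...,0,2^s - t), s >= 2."""
    if t == 1:
        head = [(2**s + 1) * 2 ** (s - i - 1) for i in range(1, s)]
    elif t == 2 ** (s - 1):
        head = [
            (2 ** (s - 1) + 1) * (2 ** (s - i - 1) + 2 ** (s - i - 2))
            for i in range(1, s - 1)
        ] + [2**s + 1]
    elif t == 2**s - 1:
        head = [2 ** (s - i + 1) for i in range(1, s)]
    else:
        raise ValueError("closed forms are listed only for t = 1, 2^(s-1), 2^s - 1")
    return [Fraction(x) for x in head] + [Fraction(1)]


def normalized_total(s, t):
    """Return sum_i W_i / (a_s + 1) with a_s = 2^s - t."""
    a_s = 2**s - t
    return sum(weights_item1(s, t)) / (a_s + 1)


for s in (2, 3, 6):
    for t in (1, 2 ** (s - 1), 2**s - 1):
        n = 2**s + t
        print(s, t, normalized_total(s, t), Fraction(n, 2) - 1)
```

In each case the two printed values agree, which supports the reading of the closed forms given above.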


2.  (0,1,1,. ..,2S"1+1) 

W = 1/12  (2n2+4n-8f (-l)s(-2)) 


= l/(3*2i+1)(2n2+4n-6+22i+2i+1(-l)(s"i))  , i = 2, 3,..., s-1 


W = 1 
s 


3.  (0,1, 2,4,4,. ..,7) 


I . i 

i I 


Wx  - 256/3-2s_J 

W2  - 124/5f 128/5 (2s“5+1) 

W3  - 29/3f32/3(2S-5+l) 

W - 16/5 •2S-i”1 , i - 4,5 s-1 

W « 1 
s 

4.  (0,1, 2, 4, 4,. ..,4,0,15) 

Wx  - 2s+^/3-32/3 


■ * . * . ‘ .»  - ^ ♦ 


( 


) 

1 

1 
W2  = 2*f3/5-8


W. 


2sH'1/3-22/3 


W.  - 2/5-11/5 
4 


= 2S_i+^/5,  i = 5,6,...,s-2 


W 


s- 1 

W =*  1 


32 


5.  (0 ,1 ,2  ,4 ,4 , • . • ,4 , 1 , 13) 
,s-4 


V * 224/3-2- 


W2  = 112/5-2s'4-22/5 


W3  = 28/3-2S_4-15/3 


W,  = 28/5 -23-5-7/5 
4 


*■  14/5*2S  ^ , i = 5,6, 


,s-2 


W 


s-1 


14 


6.  (0 , 1 ,2 ,4 ,4 , . . . ,4 ,2 , 11) 

,s-4 


W, 


64-2“ 


-1/3 


W2  = 96/5 -2s-4- 7/5 


W3  * 8-2S"4-ll/3 


W.  * 24/5-2 
4 


s-5 


* 12/5-2  , i * 5 ,6  , . . . ,s-2 


W » 8 
3-1 


w 


7.  (0,1, 2, 4, 4, ...,4,3,9) 


160/3-2S“4-5/3 


W, 


